ABS Statistical Data Integration 


WHAT IS STATISTICAL DATA INTEGRATION? 


Statistical data integration involves bringing together data from different sources, at the unit level (i.e. for an individual person or organisation) or micro level 
(e.g. information for a small geographic area), to enable analysis of a combined set of information for statistical and research purposes. 


Data integration may also be referred to as data linkage. Strictly, data linkage describes one part of the integration process, in which like data from two
sources are matched or ‘linked’ to create a cohesive record for each unit common to both datasets. More information about the linkage process is available in
the Frequently Asked Questions section.


WHY DOES THE ABS PERFORM STATISTICAL DATA INTEGRATION? 


The ABS recognises data as a valuable asset. Statistical data integration creates new opportunities to use existing data for research, reducing the need to 
collect additional information from people and organisations. A list of ABS data integration projects can be found on the Commonwealth Public Register of 
Data Integration Projects. 


ABS data integration activities are subject to the Commonwealth data integration arrangements. For more information about Commonwealth data integration 
initiatives and arrangements, see the National Statistical Service website. 


The ABS is an accredited Integrating Authority under the interim Commonwealth data integration arrangements. All data integration projects undertaken by the 
ABS are executed in a manner consistent with the High Level Principles for Data Integration Involving Commonwealth Data for Statistical and Research 
Purposes and with the requirements of the Cross Portfolio Data Integration Oversight Board. 


The ABS treats privacy obligations very seriously. All projects run by the ABS comply with the Privacy Act 1988, the Census and Statistics Act 1905 and the 
ABS Act 1975. The legislation under which the ABS operates prohibits the disclosure of identifiable information (such as name, address and date of birth
information) of a personal or domestic nature under any circumstances. 


MORE INFORMATION: 

1. Frequently Asked Questions about ABS data integration
2. Community Attitudes towards ABS statistical data integration
3. ABS Integrating Authority Accreditation information
4. Key facts about statistical data integration
5. Challenges of statistical data integration
6. Why choose the ABS?
7. The ABS objectives in statistical data integration
8. The Commonwealth Public Register of Data Integration Projects

FREQUENTLY ASKED QUESTIONS


What is statistical data integration? 

Statistical data integration involves bringing together data from different sources, at the unit level (i.e. for an individual person or organisation) or micro level 
(e.g. information for a small geographic area), to enable analysis of a combined set of information for statistical and research purposes. 

What is the purpose of data integration undertaken by the ABS? 

The Australian Bureau of Statistics (ABS) only undertakes data integration for statistical and research purposes. This means that the data is used to describe 
characteristics of groups within the population, and relationships that might exist between variables such as social, economic and environmental conditions, 
behaviours and outcomes. The integrated data is not used for regulatory or compliance purposes. 

What is an integrating authority? 

An integrating authority is the single agency ultimately accountable for the implementation of a data integration project undertaken for statistical and research 
purposes. The integrating authority must ensure that risks are assessed, managed and mitigated throughout the duration of the project, in line with the agreed 
requirements of the data custodians who contribute datasets to the project. An integrating authority must be nominated for all data integration projects involving 
Commonwealth data for statistical and research purposes. 

Does my personal information get used in data integration projects? 

Each data integration project has unique requirements for personal information. Some projects use name and address information in order to link records from
different datasets together, while others use common variables such as student identification numbers or Australian Business Numbers to link records together.
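
As a rough illustration of linking on a common variable, the sketch below joins two small, made-up datasets on a shared student identification number. The field names, the records and the link_on_key function are hypothetical and are not a description of ABS systems or methods. The sketch also assumes that each identifier appears at most once in each dataset.

```python
def link_on_key(left, right, key):
    """Deterministically link two lists of records on a shared key.

    Returns one combined record for each unit that appears in both datasets.
    Assumes each key value occurs at most once in `right`.
    """
    # Index the right-hand dataset by the linking key for fast lookup.
    right_index = {rec[key]: rec for rec in right}
    linked = []
    for rec in left:
        match = right_index.get(rec[key])
        if match is not None:
            # Merge the two records; the key appears once in the result.
            linked.append({**rec, **match})
    return linked


# Entirely made-up example data (hypothetical field names).
enrolments = [
    {"student_id": "S001", "course": "Nursing"},
    {"student_id": "S002", "course": "Carpentry"},
]
outcomes = [
    {"student_id": "S001", "employed": True},
    {"student_id": "S003", "employed": False},
]

linked_records = link_on_key(enrolments, outcomes, "student_id")
print(linked_records)
```

Only the unit present in both datasets (S001) produces a linked record. Units found in only one dataset are left unlinked.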


How is my privacy protected? 


Linked data can be a powerful but sensitive source of information. The ABS has strong safeguards in place to protect identifiable information such as name and 
address, and these have been independently audited. These safeguards are backed by legislation (the Census and Statistics Act 1905 and the Privacy Act 
1988). Only staff who need to view identifiable information as part of their duties have access to it, and only for a limited period of time. No
information is released by the ABS in such a way that identifiable information compiled through linking can be associated with a specific person. This prohibition
on release of identifiable information resulting from linking of datasets by the ABS is absolute, extending to all other parts of Government as well as the
business and research communities.


More information on the safeguards placed on privacy is available on the ABS website.


Where can I find information about ABS data integration projects?

All ABS data integration projects are registered on the online Commonwealth Data Integration Project Public Register under the Integrating Authority view. The 
ABS also publishes information about its data integration projects on the ABS website. A set of publication links is provided in the section below. 

What is the difference between a Statistical Study and a Feasibility or Quality Study? 

A Quality Study (also known as a feasibility study) is an experimental study that tests the ability to link two or more datasets together.


Quality Studies aim:
• to assess the quality of, and provide a benchmark for, data integration projects that do not make use of name and address; and
• to assess the data integration methodology to improve processes for future data integration projects.


Where used, name and address collected for a quality study will be destroyed at the end of the processing period, and the linked datasets created for the 
quality study will be deleted once the purpose of the project has been fulfilled. While some analysis of the data may be published to demonstrate quality, no 
official statistics are published from these studies. 

A Statistical Study links two or more datasets together, with the aim of creating an enhanced dataset to be used for statistical and research purposes. The 
linkage is performed without using name and address information. The data produced from a statistical study are considered to be official statistics for statistical 
and research purposes. 

OTHER INFORMATION AVAILABLE ON DATA INTEGRATION 


Information about data integration involving Commonwealth data is available on the National Statistical Service website. 


Information about the data integration projects using the 2011 Census of Population and Housing can be found in Census Data Enhancement Project: An
Update, October 2010 (ABS cat. no. 2062.0).


Results from initial investigations of data integration methods are available through Research Paper: Methodology of Evaluating the Quality of Probabilistic
Linking (ABS cat. no. 1350.0.55.018).

COMMUNITY ATTITUDES TOWARDS ABS STATISTICAL DATA INTEGRATION: FOCUS GROUP STUDY, 2011


BACKGROUND 


For almost a decade, the Australian Bureau of Statistics, in consultation with the community, has been developing ways to use statistical data integration as a 
method for maximising the use of existing data sources. The ABS is well placed to undertake statistical data integration as it is an organisation which can 
provide a safe and effective environment for acquiring, storing and linking data, as well as skilled analysts who are able to interpret and explain linked data. As 
an accredited Commonwealth Integrating Authority, for all statistical data integration activities involving Commonwealth data, the ABS has demonstrated 
adherence to the High level principles for data integration involving Commonwealth data for statistical and research purposes, and associated governance and 
implementation guidelines. 


In order to ensure that ABS use of statistical data integration is aligned with community views about the collection and dissemination of social statistics, the ABS 
has actively sought feedback from the public. In March and April 2011, eleven focus groups were conducted in both city and regional areas of New South
Wales, Victoria, South Australia and Western Australia to canvass community views on data integration for statistical and research purposes. The main purpose 
of these focus groups was to assess public awareness and acceptance of statistical data integration. 

METHODS 

A cross-section of the community was represented in the focus groups, including people of different gender, age, ethnicity, occupation and educational 
attainment. Each group comprised six to nine people and the group discussion lasted about two hours. Each group was told about the concept of data 
integration, given examples to demonstrate the benefits of data integration, and told about the ways in which their personal information is protected. Participants 
were also asked to comment specifically on ABS involvement in data integration. Reactions and issues were discussed progressively as new information was 
presented to the group. 


FINDINGS 


Historically, the public have had a high degree of trust in the ABS to produce high quality statistics and maintain the confidentiality of data providers. The 
feedback from these focus groups reflected that trust. Participants stated that: 


‘Everybody knows it [the ABS]’
‘It does have proper controls’
‘I have little knowledge...but it does have a good reputation’
‘It is seen as professional’
‘It is non-political’
‘They are good at what they do...have a good track record’
‘It is reputable’


In addition to a high level of trust in the ABS generally, there was greater confidence in the development of statistical data integration work if the ABS was a 
leader in its development. This was a common finding across all the groups. Furthermore, some participants pointed out that it was a “natural progression” and 
“bound to happen” and that, as the national statistical agency, the ABS is expected to have a central role in developing statistical data integration methods. For 
the ABS (compared with other government agencies), data collection and analysis was seen as its core business or “job”. 


The risks associated with statistical data integration were discussed by the focus groups, and the High level principles for data integration involving
Commonwealth data for statistical and research purposes were seen as being strong protection against these risks. Participants felt that the ABS, as a
recognised leader in data collection and dissemination, could face potential harm to its strong reputation and level of provider trust if integrating authorities
(including agencies other than the ABS) made errors or failed to protect confidentiality in relation to linked data. The ABS was seen as largely responsible for
the good conduct of statistical data integration projects, in a leadership role across Commonwealth government agencies.


STATISTICAL DATA INTEGRATION IN THE ABS 


In alignment with the findings from these focus groups, and with the High level principles for data integration involving Commonwealth data for statistical and 
research purposes, the ABS has pursued the development of statistical data integration methods carefully and strategically. The Census Data Enhancement 
projects and the Personal Income Tax Data Integration project have been meticulously planned and managed, and are beginning to produce useful and 
interesting results (e.g. ABS cat. no. 1351.0.55.041) which will contribute to policy development in Australia. For more information about statistical data 
integration in the ABS, please refer to www.abs.gov.au/dataintegration. 

ABS INTEGRATING AUTHORITY ACCREDITATION


In April 2012, the Australian Bureau of Statistics (ABS) became an accredited Integrating Authority under the Commonwealth data integration interim 
arrangements. A copy of the accreditation claims made by ABS, which have been verified by an independent auditor, is available through the National 
Statistical Service website. 


WHAT IS AN ACCREDITED INTEGRATING AUTHORITY? 

An accredited Integrating Authority is one which has been granted accreditation by the Cross Portfolio Data Integration Oversight Board. Projects which are 
assessed as high risk must be undertaken by an accredited Integrating Authority. 

WHY CHOOSE ABS AS YOUR INTEGRATING AUTHORITY? 

The ABS offers integration services for statistical and research purposes. ABS's vast data holdings cover a comprehensive range of social, economic and 
environmental topics, providing data at all levels from national down to local. This opens up great opportunities to research and explore special population 
groups, and relationships that might exist between variables such as social and economic conditions, behaviours and outcomes. 

The ABS is a highly trusted organisation with the experience, expertise and infrastructure to perform data integration using large and complex data sources. 
Importantly, the Census and Statistics Act 1905 guarantees the confidentiality of data provided to and collected by the ABS, allowing the ABS access to 
sensitive datasets, including some not available to any other Integrating Authority, and ensuring their safe handling. This makes the ABS the integrator of choice
for custodians of highly sensitive data.


Further information about ABS integrating authority services 


HOW CAN I GET ACCESS TO INTEGRATED DATASETS FROM THE ABS? 


The ABS offers a range of data access options, including statistical tables, various micro-data analysis tools, and a data laboratory facility. The ABS is 
continuing to develop cutting-edge approaches to making data accessible while protecting the privacy of individuals and organisations. A list of ABS data
integration projects is available on the Commonwealth Data Integration Project Register. If you are interested in working with the ABS as an Integrating
Authority, or wish to access integrated datasets for research or statistical purposes, please email the Data Integration team at data.integration@abs.gov.au.

KEY FACTS ABOUT STATISTICAL DATA INTEGRATION


The Australian Bureau of Statistics (ABS) only undertakes data integration for statistical and research purposes. This means that the data is used to describe 
characteristics of groups within a population, and relationships that might exist between variables such as social and economic conditions, behaviours and 
outcomes. The integrated data is not used for non-statistical purposes such as regulatory or compliance purposes. 


WHY IS STATISTICAL DATA INTEGRATION IMPORTANT? 


Many high priority public policy challenges, such as homelessness, climate change, and crime, do not fit neatly within one ministerial portfolio or a single 
agency’s set of responsibilities. Similarly, the information needed to address complex policy issues and research questions is often spread across a number of
agencies. 


Analysis of integrated datasets offers valuable opportunities to investigate more complex and expanded policy and research questions than would be possible 
using only separate, unlinked data sources. Integration can produce new official statistics (such as those based on analysis of longitudinal and small area data) 
to inform society. 


Data integration can reduce the need for costly collections by better leveraging existing data to meet current and emerging information requirements. 
Maximising the use of existing data, rather than establishing new collections, avoids additional load on respondents, helps to ensure cost-effectiveness and can 
improve timeliness. Data integration is therefore a key strategy for maximising governments’ investments in existing information assets. 

CHALLENGES OF STATISTICAL DATA INTEGRATION


A number of challenges need to be addressed within the National Statistical System (NSS) when undertaking integration. Data integration activities need a 
safe, secure environment, should be performed in line with community expectations and must meet legal and privacy requirements. The paragraphs below 
briefly outline some high level issues for consideration across the NSS. 


Community Acceptance of Statistical Integration
Infrastructure and Data Management
Capability and Skills
Access to Data - Privacy and Legal Considerations
Confidentiality
Separation of Identifiers from Content
Providing Access to Linked Datasets


Community Acceptance of Statistical Integration 


Qualitative research indicates that most members of the public are supportive of their data being used for statistical and research purposes, to improve social 
and economic outcomes for the Australian community, as long as the data is well managed and confidentiality is maintained. This is particularly true for health 
related information, where most research has been done on community attitudes to data linking. Legal protections to ensure data is kept confidential are 
important in obtaining community acceptance. 


Infrastructure and Data Management 

Data custodians and integrating authorities are equally responsible for managing data through appropriate storage and governance processes. In line with the 
Australian Government’s High level principles for data integration involving Commonwealth data for statistical and research purposes, there should be clearly 
defined procedures for managing data in each organisation. These procedures should cover secure data storage, data access arrangements and data retention
policies.


Major costs associated with data integration are usually incurred in the initial set-up of information technology and data management infrastructure, and
transparency measures. These costs will vary depending on existing infrastructure within an organisation.


Capability and Skills 


Having the appropriate capability to undertake data integration is a challenge for many organisations. Data integration requires analytical skills, but does not 
necessarily require additional specialist skills. In Australia, the limited supply of qualified graduates with analytical skills is a well-known issue and therefore 
developing and maintaining analytical expertise is critical to undertaking data integration activities. 


Access to Data - Privacy and Legal Considerations 


The data management practices of agencies that hold data should include access arrangements to maximise the use of data while upholding privacy and 
legislative requirements. Where practical, obtaining consent from data providers to allow data integration should be considered. Consideration should also be 
given to the primary purpose for which the data was collected. 


Commonwealth operations are covered by the Privacy Act 1988 as well as other specific legislation. Most state and territory government agencies are bound by 
their jurisdictional privacy legislation. Many jurisdictions have governance arrangements with Privacy Commissioners or Information Officers that need to be 
followed prior to accessing data. 


In addition to legislative and privacy considerations, sensitivity of the data also needs to be addressed. Datasets used in integration projects usually contain 
identifiable information about individuals or businesses (for example, name and address) and can include sensitive information such as health or income 
information. Information may also be politically sensitive when relating to government grants or commercial operations. A data management strategy should aim 
to reduce the risk associated with integrating sensitive data. Confidentialising data and implementing the separation principle are examples of strategies that 
can help to reduce risk (see discussion below).


Confidentiality 


The wealth of information provided by integrated datasets can create additional risk by increasing the chance of identifying an entity (such as a person or 
business). Protecting the confidentiality of individuals or organisations in an integrated dataset is a key element in maintaining the ongoing trust of the 
Australian public. Removing identifying details, such as names, from a dataset does not necessarily protect identity as other variables can be used to deduce 
the identity of an individual or organisation in the dataset. 


Identities can be protected by confidentialising (e.g. perturbing variables or records), by restricting access, or by a combination of both strategies.
Protecting the confidentiality and privacy of individuals and organisations also needs to be considered during the actual linking process used to form the 
integrated dataset. 
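
To make the idea of confidentialising more concrete, the sketch below applies two simple, commonly described techniques to made-up records: coarsening exact age into a broad age group, and suppressing table cells with very small counts. These are illustrative stand-ins chosen for simplicity, not the perturbation methods mentioned above and not a description of ABS practice. The function names, the 20-year band width and the minimum cell size of 3 are all assumptions.

```python
from collections import Counter


def broad_age_group(age, width=20):
    """Coarsen an exact age into a broad band such as '40-59'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def confidentialised_counts(records, min_cell=3):
    """Count records per (age group, sex) cell, suppressing small cells."""
    counts = Counter((broad_age_group(r["age"]), r["sex"]) for r in records)
    # Cells below the threshold are reported as None rather than a number.
    return {cell: (n if n >= min_cell else None) for cell, n in counts.items()}


# Entirely made-up example data (hypothetical field names).
records = [
    {"age": 41, "sex": "F"},
    {"age": 45, "sex": "F"},
    {"age": 58, "sex": "F"},
    {"age": 23, "sex": "M"},
]

table = confidentialised_counts(records)
print(table)
```

In this example the cell for females aged 40-59 holds three records and is reported, while the single male aged 20-39 falls below the threshold and is suppressed.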


Separation of Identifiers from Content 


The ABS separates identifying variables from content variables as part of its suite of strategies to protect the identities of individuals and organisations in 
datasets. This means that no one can see the identifying or demographic information used to identify which records relate to the same person or organisation
(e.g. name, address, date of birth), in conjunction with the content data (e.g. clinical information, benefit information, company profits). Instead, staff can see 
only the information they need to do the linking or analysis. So, rather than someone being able to see that John Smith has a rare medical condition, or the 
profits earned by Company X, the person doing the linking sees only the information needed to do the linking (e.g. John Smith’s name and address) and the 
analyst just sees a record, with no identifying information, showing that a person has a rare medical condition together with any other variables needed for 
analysis (e.g. broad age group, sex). 
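
The separation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the ABS implementation: the separate function, the link_key field and the example record are hypothetical. Each record is split into an identifier part and a content part that share only a random key.

```python
import secrets


def separate(records, identifier_fields):
    """Split each record into an identifier part and a content part.

    The two parts share only a random link key, so neither file on its
    own shows identifying details alongside content.
    """
    identifiers, content = [], []
    for rec in records:
        link_key = secrets.token_hex(8)  # random key with no meaning of its own
        id_part = {f: rec[f] for f in identifier_fields}
        content_part = {f: v for f, v in rec.items() if f not in identifier_fields}
        identifiers.append({"link_key": link_key, **id_part})
        content.append({"link_key": link_key, **content_part})
    return identifiers, content


# Entirely made-up example record (hypothetical field names).
records = [
    {
        "name": "John Smith",
        "address": "1 Example St",
        "condition": "rare medical condition",
        "age_group": "40-59",
    },
]

linker_view, analyst_view = separate(records, {"name", "address"})
print(linker_view)   # name and address only, plus the link key
print(analyst_view)  # content only, plus the link key
```

The person doing the linking would work only with linker_view, and the analyst only with analyst_view. The shared key lets the two parts be reconnected where that is authorised.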


Providing Access to Linked Datasets 

The wealth of information provided by integrated datasets can create additional risk by increasing the chance of identifying an entity (such as a person or 
business). This risk is increased when providing access to users who may hold some of the data within the linked dataset. The aim is to use integrated datasets 
to their maximum potential, while ensuring the privacy of data providers and maintaining the trust of the general public. 

A range of options is required to provide easy access to datasets which allow both basic and complex analysis and are flexible enough that any software
package for analysis can be used.

WHY CHOOSE THE ABS?
THE ABS ROLE IN STATISTICAL DATA INTEGRATION 


The Australian Bureau of Statistics Act 1975 (the ABS Act 1975) articulates the functions of the Bureau, which specifically include the function of ensuring the 
coordination of the operations of official bodies in the collection, compilation and dissemination of statistics and related information, with particular regard to: 


• avoiding duplication in the collection by official bodies of information for statistical purposes;
• attaining compatibility between, and the integration of, statistics compiled by official bodies; and
• the maximum possible use, for statistical purposes, of information, and means of collection of information, available to official bodies (the ABS Act 1975,
section 6).


The ABS Act 1975 gives the ABS the authority to integrate data from a range of sources and to support the maximum usage of these data by official bodies for 
statistical and research purposes. 


STRENGTHS OF ABS IN DATA INTEGRATION 


The ABS has particular strengths in relation to statistical data integration. 


There is a high level of community trust in the ABS, as shown in the results of the 2010 Community Trust in ABS Statistics Survey. Over 90% of those surveyed 
were found to trust or greatly trust the ABS. 


As Australia's national statistical organisation, the ABS has the infrastructure and expertise to enable it to undertake high risk data integration projects. This 
includes specialists in statistical methodology and analysis, technology support, legal and policy advisors and subject matter experts. These specialist areas 
within the ABS support the functional areas undertaking statistical operations, including data integration. 


The existing technical infrastructure and ABS experience in data management provide it with the capacity to support the potentially large and complex files 
associated with high risk integration projects. The ABS already deals with large files, including the Census of Population and Housing data file, which contains
over 20 million individual records. 


As a national statistical organisation, the ABS has the capability to provide high quality data at national and state/territory levels and for regions. To ensure
the ABS's impartiality and independence from external influence, the ABS Act 1975 sets out the Statistician’s independence. The ABS has a long history as a
trusted and respected national statistical agency. Not only has the ABS been collecting and disseminating data for over 100 years, but it has also been
undertaking data linking projects using Census of Population and Housing data since 2006.


The secrecy provisions of the Census and Statistics Act 1905 offer strong legislative protections. Section 19 of the Census and Statistics Act 1905 forbids past 
or present ABS officers from divulging identifiable information collected under this Act, either directly or indirectly, under penalty of up to 120 penalty units 
(currently $13,200) or imprisonment for two years, or both. The full protection of this Act applies to any datasets brought into the ABS or integrated by the ABS. 
As a Commonwealth agency, the ABS undertakes its operations in accordance with the Privacy Act 1988. 


UNIQUE ABS ADVANTAGES IN DATA INTEGRATION 

The ABS has a wide variety of existing datasets in social, economic and environmental spheres, allowing for cross-portfolio approaches to research. 

There are also some data integration activities that only the ABS is able to undertake. Integration involving data collected under the Census and Statistics Act 
1905 can only be undertaken by the ABS given the ABS’s obligation to ensure the secrecy of this information. This means, for example, that only the ABS 
would be able to conduct data integration activities involving data from the five-yearly Census of Population and Housing. 

In addition, it is likely that some datasets will only be released to the ABS. For example, the ABS is currently the only agency authorised by taxation legislation
to access identifiable information for statistical and research purposes.


This page first published 8 March 2013 


THE ABS OBJECTIVES IN STATISTICAL DATA INTEGRATION 


In order to progress its work in data integration, the ABS seeks to: 


» Build strong and mutually beneficial relationships with data custodians to:
    • understand opportunities to be gained and policy outcomes to be supported through statistical data integration activities; and
    • establish protocols for ABS access to relevant datasets held by custodians.


» Integrate important datasets to produce new statistical outputs. New datasets that have a strong potential for linkage include:
    • taxation data;
    • electoral data;
    • social security and related information;
    • data from the Medicare Benefits Scheme;
    • data from the Pharmaceutical Benefits Scheme;
    • aged care data;
    • Valuer General’s data;
    • the Australian Immunisation Register; and
    • the Research Evaluation Database of social security and related information; amongst others.


» Explore options to provide a secure and effective environment for data users to be able to query ABS and non-ABS datasets, including integrated 
datasets. Such an environment would support statistical and research requirements by providing appropriate, legislatively supported access to 
confidentialised data through the Remote Execution Environment for Microdata (REEM) and an onsite Data Laboratory. 


EXAMPLE: INDIGENOUS MORTALITY 

Aboriginal and Torres Strait Islander Australians are at a marked disadvantage compared with the rest of the population in a number of areas. Through the 
Council of Australian Governments (COAG), governments have committed to ‘Closing the Gap’ in disadvantage, including closing the gap in life expectancy 
within a generation. The Indigenous Mortality Quality Study, funded by COAG, is designed to produce improved life expectancy estimates for Aboriginal and 
Torres Strait Islander Australians and therefore to support reporting against the COAG target to close the gap in life expectancy. The project links Census data 
with 12 months of death registrations data, a key input for estimating life expectancy, for deaths that occurred after Census night 2011. Once the purpose of 
the project has been fulfilled, all linked datasets will be destroyed as will the names and addresses on the relevant 2011 Census records. 